[1] "Observed Difference in Proportions: -0.00657078071312747"
[1] "P-value: 0.966"
[1] "Fail to reject the null hypothesis: No significant evidence of an increase in attendance within 1 month."
TUT0204-C
This project investigates how Wellspring’s service usage data reflects patterns of member engagement, demographic influences, and the effects of administrative system changes. We selected three research questions that target key aspects of Wellspring’s goals: increasing accessibility, promoting early engagement, and improving retention through data-driven strategies.
To explore these, we used descriptive statistics, hypothesis testing, and a classification decision tree. Our methods include a bootstrap confidence interval for the median age of active members, a fitted decision tree for predicting favorite program type, and a permutation-based test of proportions for attendance behavior before and after a system update.
Together, these analyses provide insight into the effectiveness of current outreach methods and areas where further optimization can benefit member participation.
Research method:
Classification Decision Tree
Relevance:
Wellspring managers can benefit from knowing members' program preferences by:
recommending programs that individual members are more likely to be interested in, and
tailoring particular program types to reach certain groups of members.
In the service table, I transformed every service's name to exclude its subtitle (anything after a ":", if present) so that classification can be done over fewer session names.
Then I grouped services by member_id and summarized which session name was each member's favorite (judged by number of registrations, not successful attendances).
I used a mapping to map each member's favorite session name to a concise program category.
Finally, I joined the member background table with the table of each member's favorite program and ran R's decision tree algorithm.
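The steps above can be sketched in R on a toy dataset. The column names (member_id, service_name) and the toy rows are assumptions for illustration, not the actual Wellspring schema:

```r
# Toy stand-in for the service table (column names are assumed).
services <- data.frame(
  member_id    = c(1, 1, 1, 2, 2, 3),
  service_name = c("Yoga: Beginner", "Yoga: Advanced", "Art Therapy",
                   "Art Therapy", "Art Therapy", "Yoga: Beginner")
)

# Step 1: strip the subtitle -- drop everything after the first ":".
services$service_name <- sub(":.*$", "", services$service_name)

# Step 2: each member's favorite = the session name they registered
# for most often (registrations, not successful attendances).
favorite <- aggregate(service_name ~ member_id, data = services,
                      FUN = function(x) names(which.max(table(x))))
names(favorite)[2] <- "favorite_program"

# Step 3: join with member backgrounds and fit a tree. With real data:
#   dat <- merge(member_background, favorite, by = "member_id")
#   fit <- rpart::rpart(favorite_program ~ ., data = dat)
favorite
```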
This achieved an accuracy of 0.6286, higher than the baseline strategy of always guessing the most prevalent class, which yields an accuracy of 0.6020.
Since this classification task has no obviously unequal consequences for false positives versus false negatives, I weight these errors equally. The tree outperforms both random guessing and the most-frequent-class baseline. However, it is still not very accurate, which suggests two possibilities: (1) the variables I used are not good indicators of program preference, or (2) the model is too simplistic to capture the relationship.
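As a quick check on what the 0.6020 baseline represents, the most-frequent-class accuracy can be computed directly; the labels below are made up for illustration:

```r
# If we always predict the most common class, accuracy equals that
# class's share of the data.
favorites <- c(rep("Support Group", 59), rep("Exercise", 41))  # made-up labels
baseline  <- max(table(favorites)) / length(favorites)
baseline  # 0.59 for this toy data (0.6020 in the real dataset)
```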
attendance_status:
Focused only on members marked as “Present” (attended a program) and “Unexcused Absence” (did not attend).
Used to identify members who successfully attended a service after registration.
member_start_year & member_start_month:
Combined to determine each member’s registration date.
Used to split the dataset into two groups:
Pre-March 2024 (before system change)
Post-March 2024 (after system change)
delivery_year, delivery_month, delivery_day:
Used to determine the date of each member's first attended service.
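A minimal sketch of how these year/month(/day) columns can be turned into dates. Since member_start has no day column, the registration day-of-month is assumed here to default to the 1st:

```r
# Registration date from member_start_year / member_start_month
# (day-of-month assumed to be the 1st).
reg_date <- as.Date(sprintf("%d-%02d-01", 2024, 2))

# First-service date from delivery_year / delivery_month / delivery_day.
first_service <- as.Date(sprintf("%d-%02d-%02d", 2024, 4, 15))

# Days from registration to first attended program.
days_to_first <- as.numeric(first_service - reg_date)
days_to_first  # 74, which counts as early under a 90-day cutoff
```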
Is the proportion of members attending their first program within 3 months of registration higher after the system change (post-March 2024) compared to before?
Method:
Hypothesis Test for Two Proportions
Relevance to Wellspring:
Evaluates whether the new registration system improved early engagement.
A more accessible system may reduce entry barriers, especially for older users.
Findings can inform future outreach and retention strategies.
A. Data Preparation:
Filtered data to include members who attended a service (status = “Present”).
Created pre- and post-change groups based on member_start_month and member_start_year, using March 2024 as the cutoff.
Calculated the number of days between each member’s registration date and their first attended program.
B. Group Classification:
Defined “early attendance” as attending a program within 90 days (3 months) of registration.
For each group (pre and post), calculated the proportion of members with early attendance.
C. Hypothesis Testing:
Null Hypothesis (H₀): No difference in proportions (P₁ = P₂).
Alternative Hypothesis (H₁): Post-change proportion is greater than pre-change (P₁ < P₂).
Used R's prop.test() function to compare the two proportions.
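An illustrative prop.test() call with made-up counts (not Wellspring's actual numbers); alternative = "less" encodes the one-sided alternative H₁: P₁ < P₂:

```r
# early = counts of early attendees, total = group sizes (pre, post).
early <- c(120, 135)
total <- c(400, 410)
res <- prop.test(early, total, alternative = "less")
res$p.value          # one-sided p-value for p_pre < p_post
diff(early / total)  # observed difference in proportions
```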
D. Visualization:
Side-by-side bar chart showing % of early attendees:
X-axis: “Before March 2024” vs. “After March 2024”
Y-axis: Percentage of early attendance
Title: Impact of Registration System Change on First Attendance
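The chart described above can be produced with base R's barplot(); the percentages below are placeholders, not the study's actual values:

```r
# Placeholder percentages of early attendees for each group.
pct_early <- c("Before March 2024" = 30.0, "After March 2024" = 32.9)
barplot(pct_early,
        ylab = "Percentage of early attendance",
        main = "Impact of Registration System Change on First Attendance",
        ylim = c(0, 100))
```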
Interpretation:
With a p-value of 0.966 and an observed difference in proportions near zero (-0.0066), there is no statistically significant evidence that more members attended early after the system update.
We therefore cannot conclude that the simplified system improved accessibility or encouraged quicker engagement.
This null result is worth noting for older populations in particular, since the system change was expected to reduce barriers they may have faced navigating the older system.
[1] "Observed Difference in Proportions: -0.00657078071312747"
[1] "P-value: 0.966"
[1] "Fail to reject the null hypothesis: No significant evidence of an increase in attendance within 1 month."
age: age of a member.
last_service_date_year: Year in which the member last attended a service.
last_service_date_month: Month in which the member last attended a service.
Used filter() to include only members whose last_service_date_year (and month) fall within the past 12 months.
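A sketch of this filter using base R subsetting, with invented rows and an assumed cutoff date:

```r
# Toy member table; real columns are those listed above.
members <- data.frame(
  age                     = c(61, 48, 70),
  last_service_date_year  = c(2024, 2022, 2025),
  last_service_date_month = c(11, 5, 1)
)
cutoff <- as.Date("2024-03-01")  # "today" minus 12 months (assumed)
last_service <- as.Date(sprintf("%d-%02d-01",
                                members$last_service_date_year,
                                members$last_service_date_month))
active <- members[last_service >= cutoff, ]
nrow(active)  # 2 of the 3 toy members count as active
```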
What is the estimated range of the median age for currently active members (attended a service within the last 12 months)?
Method:
Bootstrap Confidence Interval
Relevance:
Knowing the typical age of currently active members helps Wellspring tailor its services to its core demographic.
A. Data Preparation:
Filtered to members who attended a service within the last 12 months and extracted their ages.
B. Bootstrapping Process:
Repeatedly resampled the ages with replacement, computing the median of each resample to build a bootstrap distribution of the median, then computed a 95% confidence interval from that distribution.
C. Visualized the bootstrap distribution with a histogram.
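Steps A–C can be sketched end to end in R; the ages here are simulated stand-ins for the real active-member data:

```r
set.seed(130)
ages <- sample(30:85, size = 300, replace = TRUE)  # simulated ages

# Resample with replacement 1000 times, recording each resample's median.
boot_medians <- replicate(1000, median(sample(ages, replace = TRUE)))

# 95% percentile confidence interval and histogram of the distribution.
ci <- quantile(boot_medians, c(0.025, 0.975))
hist(boot_medians, main = "Bootstrap Distribution of the Median Age",
     xlab = "Median age")
ci
```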
Interpretation:
Graph of distribution:
A large proportion of member background demographic info is missing; for example, fewer than 100 marital_status values were populated in the dataset of 4,800 observations.
Limited access to full longitudinal data: We only examined members’ first 3 months post-registration, which may miss delayed engagement or long-term patterns.
Lack of detailed demographic information: More nuanced variables such as socioeconomic status, digital literacy, or transportation access would help explain patterns in age-related service use or attendance timing.
No control over external factors: Variables like seasonal trends, specific programming changes, or public health conditions (e.g., COVID-19 surges) are not accounted for but may impact attendance and usage rates.
Research Question 1: Can we predict people's favorite program type from self-reported demographic info?
Wellspring managers can benefit from knowing members' program preferences by recommending programs that members are more likely to be interested in and by tailoring particular program types to reach certain groups of members.
The tree outperforms both random guessing and the most-frequent-class baseline. However, it is still not very accurate, which suggests at least three areas for improvement: (1) the variables I used may not be good indicators of program preference; (2) the model may be too simplistic to capture the relationship; (3) the demographic variables have many missing values, which makes it hard to generalize the model to the full population.
Research Question 2:
Did the registration system change improve early attendance?
Using a permutation-based hypothesis test for two proportions, we found no statistically significant evidence (p = 0.966) that members who registered after the March 2024 system change were more likely to attend a service early; the observed difference in proportions was essentially zero. We therefore cannot conclude that the new system improved user accessibility or reduced barriers to engagement.
Research Question 3:
Estimating Median Age of Active Members Using Bootstrapping
The true median age of active members likely falls within [56, 58]. In statistical terms, we are 95% confident that this interval captures the true median.
This indicates a predominantly older demographic among Wellspring's active members.
Thus, Wellspring might want to adjust its services to accommodate the needs of older and middle-aged groups, in order to better serve the majority of its members.